Unsupervised WSD with a Dynamic Thesaurus
Abstract
Diana McCarthy et al. (ACL 2004) obtain the predominant sense of an ambiguous word from a weighted thesaurus of words related to it. This thesaurus is built using Dekang Lin’s (COLING-ACL 1998) distributional similarity method. Lin averages distributional similarity over the whole training corpus; thus the list of related words in his thesaurus is given for a word as a type and not as a token, i.e., it does not depend on the context in which the word occurs. We observed that constructing a list similar to Lin’s thesaurus, but for a specific context, converts the method of McCarthy et al. into a word sense disambiguation method. With this new method, we obtained a precision of 69.86%, which is even 7% higher than the supervised baseline.
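The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule is one common reading of the McCarthy et al. prevalence score (each related word distributes its weight across the candidate senses in proportion to a sense-to-word similarity), and the names `rank_senses`, `toy_sense_sim`, and `TOY_SIM`, as well as all numbers, are hypothetical toy values chosen for illustration. The only difference between the two calls at the end is the neighbour list: a corpus-wide list yields a predominant sense, while a context-specific list yields a sense for that occurrence.

```python
from typing import Callable, Dict, List


def rank_senses(
    senses: List[str],
    neighbours: Dict[str, float],
    sense_sim: Callable[[str, str], float],
) -> Dict[str, float]:
    """Score candidate senses against a weighted list of related words.

    Each neighbour contributes its weight, split across the senses in
    proportion to sense_sim(sense, neighbour).
    """
    scores = {s: 0.0 for s in senses}
    for word, weight in neighbours.items():
        sims = {s: sense_sim(s, word) for s in senses}
        total = sum(sims.values())
        if total == 0:
            continue  # neighbour is unrelated to every candidate sense
        for s in senses:
            scores[s] += weight * sims[s] / total
    return scores


# Hypothetical sense-to-word similarities (assumed toy values, not real data).
TOY_SIM = {
    ("bank#finance", "money"): 0.9,
    ("bank#finance", "loan"): 0.8,
    ("bank#finance", "water"): 0.1,
    ("bank#river", "water"): 0.9,
    ("bank#river", "shore"): 0.7,
}


def toy_sense_sim(sense: str, word: str) -> float:
    return TOY_SIM.get((sense, word), 0.0)


senses = ["bank#finance", "bank#river"]

# Static (type-level) list: the same neighbours for every occurrence of "bank".
global_neighbours = {"money": 0.5, "loan": 0.3, "water": 0.2}
# Dynamic (token-level) list: neighbours built for one specific context.
context_neighbours = {"water": 0.6, "shore": 0.4}

global_scores = rank_senses(senses, global_neighbours, toy_sense_sim)
context_scores = rank_senses(senses, context_neighbours, toy_sense_sim)

predominant = max(global_scores, key=global_scores.get)
in_context = max(context_scores, key=context_scores.get)
print(predominant, in_context)
```

In this toy setting the static list selects the financial sense, while the context-specific list selects the river sense, which is the shift from predominant-sense detection to disambiguation that the abstract describes.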
Similar resources
Word Sense Disambiguation with Spreading Activation Networks Generated from Thesauri
Most word sense disambiguation (WSD) methods require large quantities of manually annotated training data and/or do not fully exploit the semantic relations of thesauri. We propose a new unsupervised WSD algorithm, which is based on generating Spreading Activation Networks (SANs) from the senses of a thesaurus and the relations between them. A new method of assigning weights to the networks’ li...
Semantic Distances for Sets of Senses and Applications in Word Sense Disambiguation
There has been an increasing interest both from the Information Retrieval community and the Data Mining community in investigating possible advantages of using Word Sense Disambiguation (WSD) for enhancing semantic information in the Information Retrieval and Data Mining process. Although contradictory results have been reported, there are strong indications that the use of WSD can contribute t...
HIT-CIR: An Unsupervised WSD System Based on Domain Most Frequent Sense Estimation
This paper presents an unsupervised system for the all-words domain-specific word sense disambiguation task. The system tags each target word with its most frequent sense, which is estimated using a thesaurus and the word distribution information in the domain. The thesaurus is automatically constructed from a bilingual parallel corpus using a paraphrase technique. The recall of this system is 43.5% on SemEv...
From Predicting Predominant Senses to Local Context for Word Sense Disambiguation
Recent work on automatically predicting the predominant sense of a word has proven to be promising (McCarthy et al., 2004). It can be applied (as a first sense heuristic) to Word Sense Disambiguation (WSD) tasks, without needing expensive hand-annotated data sets. Due to the big skew in the sense distribution of many words (Yarowsky and Florian, 2002), the First Sense heuristic for WSD is often...
Class Based Sense Definition Model for Word Sense Tagging and Disambiguation
We present an unsupervised learning strategy for word sense disambiguation (WSD) that exploits multiple linguistic resources including a parallel corpus, a bilingual machine readable dictionary, and a thesaurus. The approach is based on the Class Based Sense Definition Model (CBSDM), which generates the glosses and translations for a class of word senses. The model can be applied to resolve sense amb...